feat(datafusion): Implement IcebergWriteExec for DataFusion write support #1585

CTTY · 2025-08-06T22:54:36Z

Which issue does this PR close?

Closes Implement Writer Node: Spawn Iceberg writers and write the input data #1545
See the original draft PR: feat(datafusion): Support insert_into in IcebergTableProvider #1511

What changes are included in this PR?

Added IcebergWriteExec, which writes the output of the input execution plan to Parquet files and returns the serialized data files.

Are these changes tested?

Added unit tests.

liurenjie1024

Thanks @CTTY for this PR. In general it looks good! Just one minor nit.

liurenjie1024 · 2025-08-11T09:48:13Z

crates/integrations/datafusion/src/physical_plan/write.rs

+}
+
+impl IcebergWriteExec {
+    pub fn new(table: Table, input: Arc<dyn ExecutionPlan>, schema: ArrowSchemaRef) -> Self {


Another point is that we should ensure the input schema matches the table's schema; otherwise we would be doing schema evolution during the write.

CTTY · 2025-08-11

Column nullability and field types would be checked within execute_input_stream when it binds the Iceberg table schema to the input RecordBatch, so we don't need to worry about that now.

This may prevent us from doing any form of schema evolution, but I think that's a separate issue.
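For readers following the thread, here is a minimal sketch of the kind of check being discussed: comparing an input schema against a table schema on column name, field type, and nullability. It is not the code from this PR or from execute_input_stream. The DataType and Field types and the check_input_matches_table function are hypothetical, simplified stand-ins for the Arrow and Iceberg schema types, and the rule that a nullable input column cannot be written into a required table column is an assumption made for illustration.

```rust
// Hypothetical, simplified stand-ins for Arrow/Iceberg schema types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

impl Field {
    fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type, nullable }
    }
}

/// Returns Ok(()) when every input field can be written into the table
/// column at the same position without changing the table schema.
/// Assumed rules (illustrative only): same column count, same names,
/// same types, and no nullable input for a required table column.
fn check_input_matches_table(table: &[Field], input: &[Field]) -> Result<(), String> {
    if table.len() != input.len() {
        return Err(format!(
            "expected {} columns, got {}",
            table.len(),
            input.len()
        ));
    }
    for (t, i) in table.iter().zip(input.iter()) {
        if t.name != i.name {
            return Err(format!(
                "column name mismatch: expected '{}', got '{}'",
                t.name, i.name
            ));
        }
        if t.data_type != i.data_type {
            return Err(format!(
                "column '{}': expected {:?}, got {:?}",
                t.name, t.data_type, i.data_type
            ));
        }
        // A required table column cannot accept values that may be null.
        if !t.nullable && i.nullable {
            return Err(format!(
                "column '{}': table column is required but input is nullable",
                t.name
            ));
        }
    }
    Ok(())
}
```

Under these assumed rules, any mismatch is rejected rather than adapted to, which is why a strict check of this kind rules out schema evolution at write time.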

liurenjie1024

Thanks @CTTY for this pr, LGTM!

CTTY added 7 commits August 6, 2025 15:24

starting on work node

0f62a62

clean write

fa74d98

working writing

f05d572

working on clippy

7e6e8d5

use property to control target file size

bb721b4

minor write

7c153e4

write clean

1e87bd4

CTTY marked this pull request as ready for review August 7, 2025 00:23

liurenjie1024 reviewed Aug 8, 2025

CTTY added 3 commits August 8, 2025 10:05

Merge branch 'main' into ctty/df-write-node

92d7d98

fail for partitioned tables

24a34ef

a bug a day, keep bs away

6aa1d95

liurenjie1024 reviewed Aug 11, 2025

Merge branch 'main' into ctty/df-write-node

f1ffe9b

liurenjie1024 approved these changes Aug 12, 2025

liurenjie1024 merged commit bc469c3 into apache:main Aug 12, 2025
18 checks passed

CTTY deleted the ctty/df-write-node branch August 12, 2025 16:34
